Abstract. Let $\mu$ be a probability measure on $\mathbb{R}$ with cumulative distribution function $F$, $(x_i)_{i=1}^n$ a large i.i.d. sample from $\mu$, and $F_n$ the associated empirical distribution function. The Glivenko-Cantelli theorem states that with probability 1, $F_n$ converges uniformly to $F$. In so doing it describes the macroscopic structure of $(x_i)_{i=1}^n$; however, it is insensitive to the position of individual points. Indeed, any subset of $o(n)$ points can be perturbed at will without disturbing the convergence.

We provide several refinements of the Glivenko-Cantelli theorem which are 
sensitive not only to the global structure of the sample but also to individual 
points. Our main result provides conditions that guarantee simultaneous 
concentration of all order statistics. The example of main interest is the 
normal distribution. 



1. Introduction 

Let $\mu$ be a probability measure on $\mathbb{R}$ with cumulative distribution function $F$ and let $(x_i)_{i=1}^\infty$ denote an i.i.d. sequence of random variables with distribution $\mu$. For each $n \in \mathbb{N}$ let $F_n$ denote the empirical cumulative distribution function

$$F_n(t) = \frac{1}{n}\left|\{i \in \mathbb{N} : i \le n,\ x_i \le t\}\right|$$

where $|A|$ denotes the cardinality of a set $A$. The Glivenko-Cantelli theorem (see
e.g. [8]) states that with probability 1, 

$$\lim_{n\to\infty}\ \sup_{t\in\mathbb{R}}\ |F(t) - F_n(t)| = 0$$

The Dvoretzky-Kiefer-Wolfowitz inequality ([9] and [17]) provides a quantitative formulation of this and states that for all $n \in \mathbb{N}$ and all $\lambda > 0$, with probability at least $1 - 2\exp(-2\lambda^2)$,

$$\sup_{t\in\mathbb{R}}\ \sqrt{n}\,|F(t) - F_n(t)| \le \lambda$$
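To make the quantity in the DKW inequality concrete, here is a minimal Python sketch (not from the paper) that computes $\sup_t|F_n(t) - F(t)|$ exactly for a sample from the uniform law on [0, 1], using the fact that the supremum is attained at the sample points, and records how often $\sqrt{n}$ times it exceeds $\lambda$. The function names, the uniform law and the simulation parameters are illustrative assumptions.

```python
import math
import random


def ks_statistic_uniform(sample):
    """Exact sup_t |F_n(t) - t| for a sample from the uniform law on [0, 1].

    F_n jumps from i/n to (i+1)/n at the (i+1)-th smallest point x, so the
    supremum is the max over sorted points of max((i+1)/n - x, x - i/n).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def dkw_failure_bound(lam):
    """Upper bound 2 exp(-2 lam^2) on P(sqrt(n) * sup_t |F(t) - F_n(t)| > lam)."""
    return 2.0 * math.exp(-2.0 * lam ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    n, lam, trials = 1000, 1.0, 300
    exceed = sum(
        math.sqrt(n) * ks_statistic_uniform([rng.random() for _ in range(n)]) > lam
        for _ in range(trials)
    )
    print("empirical frequency:", exceed / trials)
    print("DKW bound:          ", dkw_failure_bound(lam))
```

The printed frequency is a noisy Monte Carlo estimate, so it should be read alongside the bound rather than expected to reproduce it exactly.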
This titanic theorem would be well deserving of the name 'the fundamental theorem of statistics', as it is the theoretical foundation behind the idea that a large independent sample is representative of the population. There is, however, a certain crudeness in this noble theorem. Asymptotically, individual points play a negligible role and we learn very little about the finer structure of the sample $(x_i)_{i=1}^n$. For instance, it gives us almost no information about either the maximum or the minimum. We could take any subset of $o(n)$ points and perturb them as we please without affecting the convergence.

Donsker's theorem (see e.g. [7], [14] and [16]) gives more insight into the structure of the sample. Consider the stochastic process $X_n$ defined on $\mathbb{R}$ by

$$X_n(t) = \sqrt{n}\,\big(F_n(t) - F(t)\big)$$

Provided that $F$ is strictly increasing and continuous, $X_n$ converges in distribution to a re-scaled Brownian bridge (more precisely, $X_n \circ F^{-1}$ converges to a Brownian bridge on [0, 1]). However, Donsker's theorem is plagued by a similar insensitivity to the cries of the minority. Through the eyes of Donsker's theorem, we can 'see' subsets as small as $\sqrt{n}$ but are blind to anything smaller, such as subsets of size $\log n$.

In this paper we provide refined forms of the Glivenko-Cantelli theorem which, 
under certain conditions, guarantee tight control over all or most points in the 
sample, not only individually but simultaneously. Super-exponential decay of the 
distribution provides simultaneous concentration of all order statistics (see theorem
1), while exponential decay provides simultaneous concentration of most order
statistics and slightly weaker control over the rest (see theorems 2 and 3). We 
provide quantitative bounds for log-concave distributions (see theorem 4). 

Our results extend the Gnedenko law of large numbers, which guarantees concentration of $\max_{1\le i\le n} x_i$. They may be compared to the results in [10], where the Gnedenko law of large numbers is extended to the multi-dimensional setting, to the paper [13] that provides estimates of order statistics in terms of Orlicz functions, and to the article [1] that concerns optimal matchings of random points uniformly distributed within the unit square. We refer the reader to [11] and [19] for an extensive treatment of empirical process theory and to [2], [4] and [18] for information on order statistics. Interesting papers on the Glivenko-Cantelli theorem include [5], [20], [21] and [22].

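Theorem 1. Let $\mu$ be any probability measure on $\mathbb{R}$ with a continuous strictly increasing cumulative distribution function $F$ such that for all $\varepsilon > 0$,

$$\lim_{t\to\infty}\frac{1 - F(t+\varepsilon)}{1 - F(t)} = \lim_{t\to-\infty}\frac{F(t)}{F(t+\varepsilon)} = 0 \tag{1.1}$$

Then there exists a sequence $(\delta_n)_{n=1}^\infty$ with $\lim_{n\to\infty}\delta_n = 0$ such that for all $n \in \mathbb{N}$, if $(x_i)_{i=1}^n$ is an i.i.d. sample from $\mu$ with corresponding order statistics $(x_{(i)})_{i=1}^n$, then with probability at least $1 - \delta_n$,

$$\sup_{1\le i\le n}\left|x_{(i)} - x^*_{(i)}\right| \le \delta_n \tag{1.2}$$

where $x^*_{(i)} = F^{-1}\big(i/(n+1)\big)$.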
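A quick numerical illustration of (1.2) for the standard normal distribution, which satisfies (1.1): a minimal Python sketch assuming only the standard library (`statistics.NormalDist` supplies $F^{-1}$). It is a simulation, not part of the proof, and it says nothing about the unspecified sequence $\delta_n$.

```python
import random
from statistics import NormalDist


def max_quantile_deviation(sample, inv_cdf):
    """sup_{1<=i<=n} |x_(i) - F^{-1}(i/(n+1))|, the left-hand side of (1.2)."""
    xs = sorted(sample)
    n = len(xs)
    # enumerate is 0-based, so the i-th order statistic is paired with (i+1)/(n+1)
    return max(abs(x - inv_cdf((i + 1) / (n + 1))) for i, x in enumerate(xs))


if __name__ == "__main__":
    rng = random.Random(1)
    quantile = NormalDist(mu=0.0, sigma=1.0).inv_cdf
    for n in (10**2, 10**3, 10**4, 10**5):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        print(n, round(max_quantile_deviation(sample, quantile), 3))
```

Theorem 1 predicts that the printed values tend to 0 in probability; theorem 4 (with $p = 2$) bounds the rate by a multiple of $\log\log n/(\log n)^{1/2}$, so any downward trend at these sample sizes is expected to be slow and noisy.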
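Theorem 2. Let $\mu$ be any probability measure on $\mathbb{R}$ with a continuous strictly increasing cumulative distribution function $F$ such that for all $\varepsilon > 0$,

$$\limsup_{t\to\infty}\frac{1 - F(t+\varepsilon)}{1 - F(t)} < 1 \tag{1.3}$$

$$\limsup_{t\to-\infty}\frac{F(t)}{F(t+\varepsilon)} < 1 \tag{1.4}$$

Let $(\omega_n)_{n=1}^\infty$ be any sequence in $\mathbb{N}$ with $\lim_{n\to\infty}\omega_n = \infty$. Then there exists a sequence $(\delta_n)_{n=1}^\infty$ with $\lim_{n\to\infty}\delta_n = 0$, such that for all $n \in \mathbb{N}$, if $(x_i)_{i=1}^n$ is an i.i.d. sample from $\mu$ with corresponding order statistics $(x_{(i)})_{i=1}^n$, then with probability at least $1 - \delta_n$,

$$\sup_{\omega_n \le i \le n - \omega_n}\left|x_{(i)} - x^*_{(i)}\right| \le \delta_n$$

where $x^*_{(i)} = F^{-1}\big(i/(n+1)\big)$.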

Theorem 3. Let $\mu$ be any probability measure on $\mathbb{R}$ that obeys the conditions of theorem 2. Then there exists $\kappa > 0$ such that for all $T > 10^4$ and all $n \in \mathbb{N}$, if $(x_i)_{i=1}^n$ is an i.i.d. sample from $\mu$ with corresponding order statistics $(x_{(i)})_{i=1}^n$, then with probability at least $1 - 400T^{-1/2}$,

$$\sup_{1\le i\le n}\left|x_{(i)} - x^*_{(i)}\right| \le \kappa T$$

Note that in theorem 2 we can take $(\omega_n)_{n=1}^\infty$ to grow arbitrarily slowly; for example, let $\omega_n = \lceil\log\log\log n\rceil$. We thus have tight control over almost the entire data set, with the exception of a very small proportion of points. This is substantially better than the $\sqrt{n}$ 'visibility' of Donsker's theorem.
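The following sketch contrasts theorems 1 and 2 on the standard Laplace distribution ($F(t) = e^t/2$ for $t < 0$), which satisfies (1.3) and (1.4) but not (1.1). It is a hypothetical illustration in Python: the Laplace law, the choice $\omega_n = \lceil\log n\rceil$ (faster growing than the example above, so that the trimming is visible at moderate $n$) and all function names are assumptions.

```python
import math
import random


def laplace_inv_cdf(u):
    """Quantile function of the standard Laplace law (F(t) = e^t / 2 for t < 0)."""
    return math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u))


def quantile_deviations(sample, inv_cdf):
    """List of |x_(i) - F^{-1}(i/(n+1))| for i = 1, ..., n."""
    xs = sorted(sample)
    n = len(xs)
    return [abs(x - inv_cdf((i + 1) / (n + 1))) for i, x in enumerate(xs)]


def trimmed_max(devs, omega):
    """Max over omega <= i <= n - omega (1-based i); requires 1 <= omega <= n / 2."""
    return max(devs[omega - 1 : len(devs) - omega])


if __name__ == "__main__":
    rng = random.Random(2)
    for n in (10**3, 10**4, 10**5):
        # A standard Laplace variable is the difference of two i.i.d. Exp(1) variables.
        sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
        devs = quantile_deviations(sample, laplace_inv_cdf)
        omega = math.ceil(math.log(n))
        print(n, round(max(devs), 3), round(trimmed_max(devs, omega), 3))
```

Because the maximum of $n$ Laplace variables fluctuates by a quantity of order 1 around $\log(n/2)$, the untrimmed column is not expected to vanish, whereas theorem 2 asserts that the trimmed column does for any $\omega_n \to \infty$.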

A probability measure $\mu$ is called $p$-log-concave for some $p \in (0,\infty)$ if it has a density function of the form $f(x) = c\exp(-g(x)^p)$, where $g$ is non-negative and convex. The 1-log-concave distributions are simply referred to as log-concave. If $\mu$ is $p$-log-concave then it is also $q$-log-concave for all $1 \le q \le p$.
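As a worked instance of the definition (our example, not the paper's), the standard normal density is 2-log-concave, and the last claim follows from a one-line composition argument:

```latex
\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} = c\,\exp\!\big(-g(x)^2\big),
\qquad c = (2\pi)^{-1/2},\quad g(x) = \frac{|x|}{\sqrt{2}}\ \ (\text{non-negative, convex});
\\[6pt]
1 \le q \le p:\qquad c\,\exp\!\big(-g(x)^p\big) = c\,\exp\!\big(-\tilde g(x)^q\big),
\qquad \tilde g = g^{p/q}.
```

Here $\tilde g$ is non-negative and convex because $s \mapsto s^{p/q}$ is convex and non-decreasing on $[0,\infty)$ and $g \ge 0$ is convex.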

Theorem 4. Let $p \ge 1$, $q > 0$ and let $\mu$ be a $p$-log-concave probability measure on $\mathbb{R}$ with a continuous strictly increasing cumulative distribution function $F$. Then there exists $c > 0$ such that for any $n \in \mathbb{N}$ and any i.i.d. sample $(x_i)_{i=1}^n$ from $\mu$ with order statistics $(x_{(i)})_{i=1}^n$, with probability at least $1 - c(\log n)^{-q}$,

$$\sup_{1\le i\le n}\left|x_{(i)} - x^*_{(i)}\right| \le c\,\frac{\log\log n}{(\log n)^{1-1/p}}$$

where $x^*_{(i)} = F^{-1}\big(i/(n+1)\big)$.

The main idea behind the proof of these theorems is to first analyze the uniform distribution on [0, 1]. We do this using a powerful representation of the empirical point process via independent random variables that allows us to use classical results such as the law of large numbers (in the form of Chebyshev's inequality) and the law of the iterated logarithm. A key step in this analysis is to exploit the inherent regularity of order statistics, which allows for control over all points based on an inspection of merely $\log n$ carefully chosen points. We then transform the points under the action of $F^{-1}$ to analyze the general case. We introduce a new class of metrics on $(0,1)$ defined by

$$\theta_p(x,y) = \max\left\{(\log x^{-1})^{1/p} - (\log y^{-1})^{1/p},\ \big(\log(1-y)^{-1}\big)^{1/p} - \big(\log(1-x)^{-1}\big)^{1/p}\right\} \tag{1.5}$$

for $1 \le p < \infty$ and $0 < x < y < 1$. To see that each $\theta_p$ is indeed a metric, note that $\theta_p(x,y)$ is decreasing in $x$ and increasing in $y$ throughout the triangular region $\{(x,y) \in (0,1)^2 : x < y\}$. We show that $F^{-1}$ is either Lipschitz or uniformly continuous with respect to these metrics (depending on the assumptions imposed on $\mu$). After this, our main results become straightforward to prove.
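The metric (1.5) is easy to evaluate numerically. Below is a minimal Python sketch that implements $\theta_p$, extended symmetrically with $\theta_p(x,x) = 0$ (our convention, since (1.5) is only stated for $x < y$), and spot-checks the triangle inequality on a grid; the function name `theta` is illustrative.

```python
import math


def theta(p, x, y):
    """theta_p(x, y) from (1.5), extended symmetrically with theta_p(x, x) = 0."""
    if not (0.0 < x < 1.0 and 0.0 < y < 1.0):
        raise ValueError("theta_p is defined on the open interval (0, 1)")
    if x == y:
        return 0.0
    if x > y:
        x, y = y, x
    e = 1.0 / p
    left = math.log(1.0 / x) ** e - math.log(1.0 / y) ** e
    right = math.log(1.0 / (1.0 - y)) ** e - math.log(1.0 / (1.0 - x)) ** e
    return max(left, right)


if __name__ == "__main__":
    # For p = 1 this is max{log(y/x), log((1-x)/(1-y))}.
    print(theta(1, 0.25, 0.5), math.log(2.0))
    print(theta(2, 1e-6, 2e-6), theta(2, 0.4, 0.8))
```

For $p = 1$ the two terms are $\log(y/x)$ and $\log\big((1-x)/(1-y)\big)$, so (2.2) and (2.3) below together say exactly that $\theta_1\big(\gamma_{(i)}, i/(n+1)\big) < \log T$.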

There are endless variations on the main theme of this paper. Our intention 
is simply to highlight a phenomenon and introduce methods by which to study it. 
Note that our results are purely asymptotic in nature and we can (and do) assume throughout the paper that $n \ge n_0$ for some $n_0 \in \mathbb{N}$.




2. The uniform distribution 

Let $(\gamma_i)_{i=1}^n$ denote an i.i.d. sample from the uniform distribution on [0, 1] with corresponding order statistics $(\gamma_{(i)})_{i=1}^n$ and let $(z_i)_{i=1}^{n+1}$ be an i.i.d. sequence of random variables that follow the standard exponential distribution. For $1 \le i \le n$ define

$$y_i = \left(\sum_{j=1}^{i} z_j\right)\left(\sum_{j=1}^{n+1} z_j\right)^{-1}$$

It is of great interest to us that $(y_i)_{i=1}^n$ and $(\gamma_{(i)})_{i=1}^n$ have the same distribution in $\mathbb{R}^n$ (see chapter 5 in [6]). This is nothing but an expression of the fact that the empirical point process locally resembles the Poisson point process. Also of interest is the fact that these random vectors have the same distribution as the partial sums of a random vector uniformly distributed (with respect to Lebesgue measure) in the standard simplex $\Delta^n = \{w \in \mathbb{R}^{n+1} : w_i \ge 0\ \forall i,\ \sum_i w_i = 1\}$. The power of this representation is that we have an expression for $(\gamma_{(i)})_{i=1}^n$ in terms of independent random variables. Note that

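$$y_i\left(\frac{i}{n+1}\right)^{-1} = \left(\frac{1}{i}\sum_{j=1}^{i} z_j\right)\left(\frac{1}{n+1}\sum_{j=1}^{n+1} z_j\right)^{-1} \tag{2.1}$$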
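The representation is also a practical way to sample uniform order statistics without sorting. A minimal Python sketch, assuming only the standard library; the helper name is ours. The usage check compares a Monte Carlo average of $y_2$ with $\mathbb{E}\gamma_{(2)} = 2/(n+1)$, a standard fact about uniform order statistics.

```python
import random


def uniform_order_stats_via_exponentials(n, rng):
    """Return (y_1, ..., y_n) with y_i = (z_1 + ... + z_i) / (z_1 + ... + z_{n+1}).

    The z_j are i.i.d. standard exponential; by the representation above the
    result has the law of the sorted values of n i.i.d. uniform points on [0, 1].
    """
    z = [rng.expovariate(1.0) for _ in range(n + 1)]
    prefix, running = [], 0.0
    for zj in z:
        running += zj
        prefix.append(running)
    total = prefix[-1]
    return [s / total for s in prefix[:n]]


if __name__ == "__main__":
    rng = random.Random(4)
    n, trials = 9, 20000
    mean_y2 = sum(uniform_order_stats_via_exponentials(n, rng)[1] for _ in range(trials)) / trials
    print(mean_y2, "vs", 2 / (n + 1))
```

Generating $n+1$ exponentials costs $O(n)$, compared with $O(n\log n)$ for sorting $n$ uniforms, and the independence of the $z_j$ is what the proofs below exploit.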

Both lemma 1 and lemma 3 below can be compared to the results in [23].

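Lemma 1. Let $T > 10^4$ and $n \in \mathbb{N}$. With probability at least $1 - 400T^{-1/2}$ the following inequalities hold simultaneously for all $1 \le i \le n$,

$$T^{-1} < \gamma_{(i)}\left(\frac{i}{n+1}\right)^{-1} < T \tag{2.2}$$

$$T^{-1} < \left(1 - \gamma_{(i)}\right)\left(1 - \frac{i}{n+1}\right)^{-1} < T \tag{2.3}$$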

Proof. Let $Q = 2^{-2}T^{1/2}$ and momentarily fix $1 \le i \le n+1$. The random variable $i^{-1}\sum_{j=1}^{i} z_j$ has mean 1 and variance $i^{-1}$. Using Chebyshev's inequality, with probability at least $1 - i^{-1}Q^{-1}$ we have

$$\frac{1}{i}\sum_{j=1}^{i} z_j < Q$$

The random variable

$$U_i = \left|\{j \in \mathbb{N} : j \le i,\ z_j < 2Q^{-1}\}\right|$$

follows a binomial distribution with $i$ trials and success probability $1 - \exp(-2Q^{-1}) \le 2Q^{-1}$. Using Chebyshev's inequality again, with probability at least $1 - 32i^{-1}Q^{-1}$ we have $U_i < i/2$, which implies that $i^{-1}\sum_{j=1}^{i} z_j > Q^{-1}$. Hence, with probability at least $1 - 33i^{-1}Q^{-1}$ we have

$$Q^{-1} < \frac{1}{i}\sum_{j=1}^{i} z_j < Q \tag{2.4}$$

Let $M = \lfloor\log_2 n\rfloor$. With probability at least $1 - 33Q^{-1}\sum_{k=0}^{M}2^{-k} - 33(n+1)^{-1}Q^{-1} \ge 1 - 100Q^{-1}$, (2.4) holds simultaneously for $i = 1, 2, 2^2, 2^3, \ldots, 2^M$ and for $i = n+1$. Hence, by (2.1), with probability at least $1 - 100Q^{-1}$ we have that for all such $i$,

$$Q^{-2}\,\frac{i}{n+1} < y_i < Q^2\,\frac{i}{n+1}$$

Since $(y_i)_{i=1}^{n+1}$ is an increasing sequence, control over the values $(y_{2^j})_{j=0}^{M}$ leads to control over the entire sequence (each $1 \le i \le n$ lies between two consecutive controlled indices, which costs at most a factor 2, and $2Q^2 = T/8 < T$). Recalling the representation of $(\gamma_{(i)})_{i=1}^n$ in terms of $(y_i)_{i=1}^n$, the bound (2.2) follows for all $1 \le i \le n$. The bound (2.3) then follows by symmetry. □
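The deterministic step at the end of the proof (bounds at the dyadic indices and at $n+1$ propagate to every index at the price of a factor 2) can be checked numerically. In the Python sketch below, `spread` over an index set is the smallest $R \ge 1$ with $R^{-1} \le y_i\,(i/(n+1))^{-1} \le R$ on that set; monotonicity of $(y_i)$ forces the spread over all indices to be at most twice the spread over the dyadic ones. The helper names are illustrative.

```python
import random


def sample_y(n, rng):
    """y_1, ..., y_{n+1} from the exponential representation; y_{n+1} = 1 exactly."""
    z = [rng.expovariate(1.0) for _ in range(n + 1)]
    prefix, running = [], 0.0
    for zj in z:
        running += zj
        prefix.append(running)
    return [s / running for s in prefix]


def dyadic_indices(n):
    """The indices 1, 2, 4, ..., 2^M with M = floor(log2 n), followed by n + 1."""
    idx, k = [], 1
    while k <= n:
        idx.append(k)
        k *= 2
    return idx + [n + 1]


def spread(y, indices, n):
    """Smallest R >= 1 with 1/R <= y_i * (n+1) / i <= R for all i in indices (1-based)."""
    worst = 1.0
    for i in indices:
        r = y[i - 1] * (n + 1) / i
        worst = max(worst, r, 1.0 / r)
    return worst


if __name__ == "__main__":
    rng = random.Random(5)
    n = 10**5
    y = sample_y(n, rng)
    print(spread(y, dyadic_indices(n), n), spread(y, range(1, n + 2), n))
```

Only about $\log_2 n$ indices need probabilistic control, which is why the union bound in the proof costs a geometric series rather than a factor $n$.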

Lemma 2. Let $t \in (0,1)$ and $n \in \mathbb{N}$. With probability at least $1 - 2\exp(-5^{-1}nt^2)$ the following inequality holds simultaneously for all $1 \le i \le n$,

$$\left|\gamma_{(i)} - \frac{i}{n+1}\right| < t \tag{2.5}$$

Proof. We can assume without loss of generality that $n^{-1} < 2t/3$ (otherwise the probability bound becomes trivial). Note that since our sample is taken from the uniform distribution we have

$$\sup_{1\le i\le n}\left|\gamma_{(i)} - i(n+1)^{-1}\right| \le n^{-1} + \sup_{1\le i\le n}\left|\gamma_{(i)} - in^{-1}\right| \le n^{-1} + \sup_{0\le s\le 1}\left|F_n(s) - F(s)\right|$$

where $F(s) = s$ is the cumulative distribution function and $F_n$ is the empirical distribution function. By the Dvoretzky-Kiefer-Wolfowitz inequality (as mentioned in the introduction, with $\lambda = \sqrt{n}\,t/3$, and since $2/9 > 1/5$), with probability at least $1 - 2\exp(-5^{-1}nt^2)$ we have

$$\sup_{0\le s\le 1}\left|F_n(s) - F(s)\right| < t/3$$

and the result follows. □
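The proof rests on the deterministic inequality $\sup_i|\gamma_{(i)} - i/(n+1)| \le n^{-1} + \sup_s|F_n(s) - s|$, which a few lines of Python can confirm on random samples; the sketch also evaluates the probability bound of lemma 2, capped at 1. The function names are ours.

```python
import math
import random


def sup_deviation_from_means(us):
    """sup_i |gamma_(i) - i/(n+1)| for points us in [0, 1]."""
    xs = sorted(us)
    n = len(xs)
    return max(abs(x - (i + 1) / (n + 1)) for i, x in enumerate(xs))


def ks_uniform(us):
    """sup_s |F_n(s) - s|, which is attained at the sample points."""
    xs = sorted(us)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def lemma2_failure_bound(n, t):
    """The failure probability 2 exp(-n t^2 / 5) of lemma 2, capped at 1."""
    return min(1.0, 2.0 * math.exp(-n * t * t / 5.0))


if __name__ == "__main__":
    rng = random.Random(3)
    n = 500
    us = [rng.random() for _ in range(n)]
    print(sup_deviation_from_means(us), "<=", 1 / n + ks_uniform(us))
    print(lemma2_failure_bound(10**4, 0.1))
```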

Note that in the preceding proof one can also use Doob's martingale inequality (in the form of Kolmogorov's inequality) and the representation of $(\gamma_{(i)})_{i=1}^n$ in terms of $(y_i)_{i=1}^n$, although this approach yields an inferior probability bound.

Lemma 3. Let $(\omega_n)_{n=1}^\infty$ be any sequence in $\mathbb{N}$ such that $\lim_{n\to\infty}\omega_n = \infty$. Then for all $T > 1$ and all $\delta \in (0,1)$ there exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, if $(\gamma_{(i)})_{i=1}^n$ are the order statistics from an i.i.d. sample from the uniform distribution on [0, 1], then with probability at least $1 - \delta$, (2.2) and (2.3) hold for $\omega_n \le i \le n - \omega_n$.

Proof. We use the representation (2.1). Let $T > 1$ and $\delta \in (0,1)$ be given. Without loss of generality we may assume that $T \le 2$. Let $(z_i)_{i=1}^\infty$ denote any i.i.d. sequence of random variables that follow the standard exponential distribution. Define the deterministic sequence $(\lambda_j)_{j=1}^\infty$ as follows,

$$\lambda_j = \mathbb{P}\left\{\sup_{i\ge j}\ (2i\log\log i)^{-1/2}\left|\sum_{k=1}^{i}(z_k - 1)\right| < 2\right\}$$

Note that $(\lambda_j)_{j=1}^\infty$ is an increasing sequence and, by the law of the iterated logarithm, $\lim_{j\to\infty}\lambda_j = 1$. Fix $n_0 \in \mathbb{N}$ with $n_0 \ge 64\delta^{-1}(T^{1/2} - 1)^{-2}$ such that for all $n \ge n_0$ we have the following inequalities,

$$\lambda_{\omega_n} \ge 1 - \delta/4, \qquad \left(\frac{8\log\log\omega_n}{\omega_n}\right)^{1/2} < T^{1/2} - 1$$

Now consider any $n \ge n_0$ and let $(\gamma_{(i)})_{i=1}^n$ denote the order statistics mentioned in the statement of the lemma. With probability at least $1 - \delta/4$, for all $\omega_n \le i \le n$,

$$\left|1 - \frac{1}{i}\sum_{j=1}^{i} z_j\right| < \left(\frac{8\log\log\omega_n}{\omega_n}\right)^{1/2}$$

By Chebyshev's inequality and the fact that the function $u \mapsto u^{-1}$ is 4-Lipschitz on $[1/2, \infty)$, with probability at least $1 - 16n^{-1}(T^{1/2} - 1)^{-2} \ge 1 - \delta/4$,

$$\left|1 - \left(\frac{1}{n+1}\sum_{j=1}^{n+1} z_j\right)^{-1}\right| < T^{1/2} - 1$$

By (2.1), with probability at least $1 - \delta/2$, (2.2) holds for all $\omega_n \le i \le n$. By symmetry, with the same probability (2.3) holds for all $1 \le i \le n - \omega_n$. The lemma is thus proven. □



3. The general case 

Lemma 4. Let $F$ be a continuous strictly increasing cumulative distribution function that satisfies (1.1). Then $F^{-1}$ is continuous and for all $T > 1$ and all $\delta > 0$ there exists $\eta \in (0,1)$ such that for all $x, y \in (0,\eta)$ with $T^{-1} < xy^{-1} < T$ and all $x, y \in (1-\eta, 1)$ with $T^{-1} < (1-x)(1-y)^{-1} < T$ we have $|F^{-1}(x) - F^{-1}(y)| < \delta$.

Proof. Consider any $T > 1$ and $\delta > 0$. By (1.1) there exists $t_0 \in \mathbb{R}$ such that for all $t < t_0$, $TF(t) < F(t+\delta)$. Let $\eta_1 = F(t_0)$. Consider any $x, y \in (0,\eta_1)$ such that $T^{-1} < xy^{-1} < T$. Without loss of generality, $x < y$. Let $s = F^{-1}(x)$ and $t = F^{-1}(y)$. Then $s < t_0$, hence $F(t) = y < Tx = TF(s) < F(s+\delta)$, from which it follows that $t < s + \delta$ and that $|F^{-1}(x) - F^{-1}(y)| < \delta$. Analysis of the right hand tail is identical and provides us with $\eta_2 > 0$ such that for all $x, y \in (1-\eta_2, 1)$ with $T^{-1} < (1-x)(1-y)^{-1} < T$ we have $|F^{-1}(x) - F^{-1}(y)| < \delta$. The result follows with $\eta = \min\{\eta_1, \eta_2\}$. □
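To see what separates condition (1.1) from (1.3) and (1.4), the Python sketch below evaluates the left-tail ratio $F(t)/F(t+\varepsilon)$ with $\varepsilon = 1$ for the standard normal law, where it tends to 0, and for the standard Laplace law, where it equals $e^{-1}$ for all $t < -1$. The two example distributions are our choice; the normal CDF is written with `math.erfc` to keep relative precision deep in the tail.

```python
import math


def normal_cdf(t):
    """Standard normal CDF; erfc keeps relative accuracy for very negative t."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def laplace_cdf(t):
    """Standard Laplace CDF."""
    return 0.5 * math.exp(t) if t < 0 else 1.0 - 0.5 * math.exp(-t)


def left_tail_ratio(cdf, t, eps=1.0):
    """F(t) / F(t + eps): tends to 0 under (1.1) and stays below 1 under (1.4)."""
    return cdf(t) / cdf(t + eps)


if __name__ == "__main__":
    for t in (-5.0, -10.0, -20.0):
        print(t, left_tail_ratio(normal_cdf, t), left_tail_ratio(laplace_cdf, t))
```

In the language of lemma 4, for the normal law any factor $T$ is eventually absorbed by a shift of size $\delta$ deep in the tail, while for the Laplace law only a bounded factor $T \le e^{\delta}$ is, which is why theorem 2 needs the trimming by $\omega_n$.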
Lemma 5. Let $F$ be a continuous strictly increasing cumulative distribution function that satisfies both (1.3) and (1.4). Then $F^{-1}$ is continuous and for all $\delta > 0$ there exists $T > 1$ such that for all $x, y \in (0,1)$ such that $T^{-1} < xy^{-1} < T$ and $T^{-1} < (1-x)(1-y)^{-1} < T$ we have $|F^{-1}(x) - F^{-1}(y)| < \delta$. In particular, $F^{-1}$ is uniformly continuous with respect to the metric $\theta_1$ (see (1.5)).

Proof. Consider any $\delta > 0$. By (1.4) there exist $T_1 > 1$ and $t_0 \in \mathbb{R}$ such that for all $t < t_0$, $T_1F(t) < F(t+\delta)$. Let $\eta_1 = \min\{F(t_0), 2^{-1}\}$. As in the proof of the previous lemma, it follows that for all $x, y \in (0,\eta_1)$ with $T_1^{-1} < xy^{-1} < T_1$ we have $|F^{-1}(x) - F^{-1}(y)| < \delta$. Similarly (using (1.3)), there exist $T_2 > 1$ and $\eta_2 \in (0, 2^{-1}]$ such that for all $x, y \in (1-\eta_2, 1)$ with $T_2^{-1} < (1-x)(1-y)^{-1} < T_2$ we have $|F^{-1}(x) - F^{-1}(y)| < \delta$. By continuity of $F^{-1}$ relative to the standard topology on $(0,1)$, and by compactness of $[2^{-1}\eta_1, 1 - 2^{-1}\eta_2]$, there exists $0 < \delta' < 10^{-1}\min\{\eta_1, \eta_2\}$ such that for all $x, y \in [2^{-1}\eta_1, 1 - 2^{-1}\eta_2]$ with $|x - y| < \delta'$ we have $|F^{-1}(x) - F^{-1}(y)| < \delta$. We leave it to the reader to verify that the result holds with

$$T = \min\{T_1, T_2, 1 + \delta'\}$$

□

Proof of theorem 1. We shall construct a function $h$ that takes an arbitrary $\delta \in (0,1)$ and produces an appropriate $n_0 = h(\delta) \in \mathbb{N}$. Then, using this function, we shall define the desired sequence $(\delta_n)_{n=1}^\infty$ that is mentioned in the statement of the theorem. To this end, let $\delta \in (0,1)$ be given. Define

$$T = 10^6\delta^{-2} \tag{3.1}$$

By lemma 4 there exists $\eta \in (0,1)$ such that if $x, y \in (0,\eta)$ and $T^{-1} < xy^{-1} < T$, or $x, y \in (1-\eta, 1)$ and $T^{-1} < (1-x)(1-y)^{-1} < T$, then $|F^{-1}(x) - F^{-1}(y)| < \delta$. By compactness, $F^{-1}$ is uniformly continuous on $[\eta/2, 1 - \eta/2]$, which implies the existence of $t \in (0, \eta/2)$ such that if $x, y \in [\eta/2, 1 - \eta/2]$ and $|x - y| < t$, then $|F^{-1}(x) - F^{-1}(y)| < \delta$. Define

$$n_0 = h(\delta) = \left\lceil 5t^{-2}\log(4\delta^{-1})\right\rceil \tag{3.2}$$

and consider any $n \ge n_0$. Let $(\gamma_{(i)})_{i=1}^n$ denote the order statistics corresponding to an i.i.d. sample from the uniform distribution on [0, 1]. Note that we have the representation

$$x_{(i)} = F^{-1}(\gamma_{(i)}) \tag{3.3}$$

valid for all $1 \le i \le n$. By lemmas 1 and 2, as well as equations (3.1) and (3.2), with probability at least $1 - \delta$ inequalities (2.2), (2.3) and (2.5) hold simultaneously for all $1 \le i \le n$. Suppose that these inequalities do indeed hold and consider any fixed $1 \le i \le n$. Since $t < \eta/2$, one of the three sets $(0,\eta)$, $[\eta/2, 1 - \eta/2]$ and $(1-\eta, 1)$ contains both $\gamma_{(i)}$ and $i(n+1)^{-1}$, which implies that $|F^{-1}(\gamma_{(i)}) - F^{-1}(i(n+1)^{-1})| < \delta$, which is inequality (1.2).

Define the non-decreasing sequence $(\kappa_n)_{n=1}^\infty$ by $\kappa_n = \max\{h(e^{-i}) : 1 \le i \le n\}$ and set

$$\delta_n = \exp\big(-\max\{i \in \mathbb{N} : \kappa_i \le n\}\big)$$

where we define $\max\emptyset = 0$. It is clear that $\lim_{n\to\infty}\delta_n = 0$. Consider any fixed $n \in \mathbb{N}$. If $\{i \in \mathbb{N} : \kappa_i \le n\} = \emptyset$ then the probability bound is trivial; otherwise let $j = \max\{i \in \mathbb{N} : \kappa_i \le n\}$. The result follows by the inequality $h(\delta_n) = h(e^{-j}) \le \kappa_j \le n$ and by definition of the function $h$. □
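The diagonal construction of $(\delta_n)$ at the end of the proof is purely mechanical once some $h$ is available. The Python sketch below implements it for any function `h` mapping $\delta \in (0,1)$ to a positive integer; `hypothetical_h`, which mimics (3.2) with the modulus $t$ replaced by $\delta$, is a made-up stand-in, since the true $h$ depends on $F$ through lemma 4. Because $(\kappa_i)$ is non-decreasing, $\max\{i : \kappa_i \le n\}$ is a binary search.

```python
import bisect
import math


def hypothetical_h(delta):
    """Hypothetical sample-size function shaped like (3.2), with t replaced by delta."""
    return math.ceil(5.0 * delta ** -2 * math.log(4.0 / delta))


def delta_sequence(h, n_max, i_max=30):
    """delta_n = exp(-max{i : kappa_i <= n}) with kappa_i = max_{j <= i} h(e^{-j}).

    i_max caps the index search; it must be large enough that kappa_{i_max} > n_max.
    """
    kappa, running = [], 0
    for i in range(1, i_max + 1):
        running = max(running, h(math.exp(-i)))
        kappa.append(running)
    # kappa is non-decreasing, so bisect_right counts the i with kappa_i <= n,
    # which equals max{i : kappa_i <= n} (and 0 when that set is empty).
    return [math.exp(-bisect.bisect_right(kappa, n)) for n in range(1, n_max + 1)]


if __name__ == "__main__":
    deltas = delta_sequence(hypothetical_h, 10**5)
    for n in (10, 10**3, 10**4, 10**5):
        print(n, deltas[n - 1])
```

The defining property used in the proof, $h(\delta_n) \le n$ whenever $\delta_n < 1$, holds by construction, and $\delta_n \to 0$ as long as $h$ is finite for every $\delta$.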
Proof of theorems 2 and 3. The proof is very similar to that of theorem 1. We use the representation (3.3). The main difference is that we use lemmas 3 and 5 instead of lemmas 1 and 4. The details are left to the reader. □



4. Log-concave distributions 

The following two lemmas are modifications of lemmas 6 and 9 in [10].

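Lemma 6. Let $\mu$ be a log-concave probability measure on $\mathbb{R}$ with a continuous strictly increasing cumulative distribution function $F$. Then there exists $c > 0$ such that for all $0 < x < y < 1$,

$$\left|F^{-1}(y) - F^{-1}(x)\right| \le c\max\left\{\left|F^{-1}(y)\right|\frac{\log(x^{-1}y)}{\log y^{-1}},\ \left|F^{-1}(x)\right|\frac{\log\big((1-x)(1-y)^{-1}\big)}{\log(1-x)^{-1}}\right\} \tag{4.1}$$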

Proof. By theorem 5.1 in [15] (see lemma 5 in [10] for a proof), $F$ is log-concave. Hence the function $u(t) = -\log F(t)$ is convex (and strictly decreasing). Let $\mathbb{E}\mu$ denote the centroid of $\mu$ (the expected value of a random variable with distribution $\mu$). By lemma 5.12 in [15] (see also lemma 3.3 in [3]), $F(\mathbb{E}\mu) \ge e^{-1}$, hence $u(\mathbb{E}\mu) \le 1$. By convexity of $u$ we have the inequality $(t-s)^{-1}(u(t) - u(s)) \le (\mathbb{E}\mu - t)^{-1}(u(\mathbb{E}\mu) - u(t))$, which is valid for all $s < t < \mathbb{E}\mu$. Let $0 < x < y < \min\{e^{-2}F(0), F(-2|\mathbb{E}\mu|)\}$ and define $s = F^{-1}(x)$ and $t = F^{-1}(y)$. Then we have

$$F^{-1}(y) - F^{-1}(x) \le \left(\mathbb{E}\mu - F^{-1}(y)\right)\frac{\log(x^{-1}y)}{\log y^{-1} - u(\mathbb{E}\mu)}$$

It follows from the restrictions on $y$ that $F^{-1}(y) < 0$ and that $|F^{-1}(y)| \ge 2|\mathbb{E}\mu|$. Since $y < e^{-2} \le F(\mathbb{E}\mu)^2$, it follows that $\log y^{-1} > 2u(\mathbb{E}\mu)$, and (4.1) follows for such $x$ and $y$ with $c = 4$. For other values of $x$ and $y$, inequality (4.1) follows by compactness, continuity and symmetry. □

Lemma 7. Let $p \ge 1$ and let $\mu$ be a $p$-log-concave probability measure on $\mathbb{R}$ with cumulative distribution function $F$. Then there exists $c > 0$ such that for all $x \in (0,1)$,

$$\left|F^{-1}(x)\right| \le c\max\left\{(\log x^{-1})^{1/p},\ \big(\log(1-x)^{-1}\big)^{1/p}\right\} \tag{4.2}$$

As a consequence of (4.1) and (4.2), $F^{-1}$ is Lipschitz with respect to the metric $\theta_p$ (see (1.5)).

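Proof. By lemma 9 in [10] (which holds for $p \ge 1$) there exist $c_1, c_2 > 0$ and $t_0 > 1$ such that for all $t < -t_0$, $F(t) \le c_1|t|^{1-p}\exp(-c_2|t|^p)$. Let $\eta_1 = \min\{F(-t_0), c_1^{-1}\}$ and consider any $x \in (0,\eta_1)$. Let $t = F^{-1}(x)$, so that $t < -t_0$. Hence $x = F(t) \le c_1|t|^{1-p}\exp(-c_2|t|^p) \le c_1\exp(-c_2|t|^p)$ (using $|t| > 1$ and $p \ge 1$), which implies that

$$\left|F^{-1}(x)\right| = |t| \le \left(c_2^{-1}\big(\log c_1 + \log x^{-1}\big)\right)^{1/p} \le 2^{1/p}c_2^{-1/p}(\log x^{-1})^{1/p}$$

The result now follows by symmetry, compactness and continuity. □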
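For the standard normal law ($p = 2$), inequality (4.2) can be probed numerically: the Python sketch below evaluates $|F^{-1}(x)|/(\log x^{-1})^{1/2}$ for $x = 10^{-k}$ using `statistics.NormalDist.inv_cdf`. This is an illustration under our choice of distribution; it does not estimate the constant $c$ of the lemma in general.

```python
import math
from statistics import NormalDist


def lemma7_ratio(x, p=2.0, inv_cdf=NormalDist().inv_cdf):
    """|F^{-1}(x)| / (log(1/x))^{1/p} for x in (0, 1/2], the left-tail part of (4.2)."""
    return abs(inv_cdf(x)) / math.log(1.0 / x) ** (1.0 / p)


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 50, 100):
        print(f"x = 1e-{k}:", round(lemma7_ratio(10.0 ** -k), 4))
```

The Gaussian tail estimate $\Phi(-z) < \varphi(z)/z$ implies that this ratio stays below $\sqrt{2}$, so for the normal law any $c \ge \sqrt{2}$ works in the left tail.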
Lemma 8. Let $F$ be a continuous strictly increasing cumulative distribution function associated to a log-concave probability measure. Then there exists $c > 0$ such that for all $\varepsilon \in (0, 1/2)$ and all $x, y \in [\varepsilon, 1 - \varepsilon]$,

$$\left|F^{-1}(x) - F^{-1}(y)\right| \le c\varepsilon^{-1}|x - y|$$

Proof. This follows from lemmas 6 and 7 with $p = 1$ and the inequality $\log t \le t - 1$. □

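Proof of theorem 4. By lemmas 1, 6 and 7 (applying lemma 1 with $T = (\log n)^{2q}$, so that $400T^{-1/2} = 400(\log n)^{-q}$), with probability at least $1 - 400(\log n)^{-q}$, for all $i \le n^{3/4}$ and all $i \ge n - n^{3/4}$ we have

$$\left|x_{(i)} - x^*_{(i)}\right| \le c\,\frac{\log\log n}{(\log n)^{1-1/p}}$$

Let $I = [2^{-1}n^{-1/4}, 1 - 2^{-1}n^{-1/4}]$. By lemma 8, for all $x, y \in I$ we have

$$\left|F^{-1}(x) - F^{-1}(y)\right| \le cn^{1/4}|x - y|$$

By lemma 2, with probability at least $1 - 2\exp(-5^{-1}n^{1/4})$, for all $1 \le i \le n$ we have

$$\left|\gamma_{(i)} - i(n+1)^{-1}\right| < n^{-3/8}$$

Hence for all $n^{3/4} < i < n - n^{3/4}$ both $\gamma_{(i)}$ and $i(n+1)^{-1}$ are elements of $I$, so that by (3.3) $|x_{(i)} - x^*_{(i)}| \le cn^{1/4}n^{-3/8} = cn^{-1/8}$ for these $i$, and the result follows. □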